Supervised Learning of Term Similarities
Abstract
In this paper we present a method for the automatic discovery and tuning of term similarities. The method is based on the automatic extraction of significant patterns in which terms tend to appear. In addition, we use lexical and functional similarities between terms to define a hybrid similarity measure as a linear combination of the three similarities (pattern-based, lexical, and functional). We then present a genetic algorithm approach to the supervised learning of the parameters used in this linear combination. We used a domain-specific ontology to evaluate the generated similarity measures and to set the direction of their convergence. The approach has been tested and evaluated in the domain of molecular biology.
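The idea of tuning a linear combination of three similarities with a genetic algorithm can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the squared-error fitness against an ontology-derived target score, the truncation selection, the averaging crossover, the Gaussian mutation, and the toy data are all hypothetical.

```python
import random

def hybrid_similarity(weights, sims):
    """Linear combination of (contextual, lexical, functional) similarities."""
    return sum(w * s for w, s in zip(weights, sims))

def normalize(weights):
    """Clamp weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]

def fitness(weights, training_pairs):
    """Negative mean squared error against the target score (assumed here
    to be derived from a domain ontology); higher is better."""
    error = sum((hybrid_similarity(weights, sims) - target) ** 2
                for sims, target in training_pairs)
    return -error / len(training_pairs)

def evolve(training_pairs, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Search for combination weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    population = [[1.0 / 3] * 3]
    population += [normalize([rng.random() for _ in range(3)])
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, training_pairs), reverse=True)
        parents = population[:pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]         # averaging crossover
            child = [x + rng.gauss(0.0, sigma) for x in child]  # Gaussian mutation
            children.append(normalize(child))
        population = parents + children
    return max(population, key=lambda w: fitness(w, training_pairs))

# Toy data, invented for illustration:
# ((contextual, lexical, functional) similarities, target similarity)
TOY_PAIRS = [
    ((0.9, 0.8, 0.7), 0.85),
    ((0.2, 0.1, 0.4), 0.20),
    ((0.6, 0.9, 0.3), 0.60),
    ((0.1, 0.7, 0.2), 0.25),
]

best_weights = evolve(TOY_PAIRS)
print("learned weights:", best_weights)
```

Because the best half of each generation is carried over unchanged, the best candidate never gets worse from one generation to the next, so the learned weights fit the toy data at least as well as the uniform starting weights.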
Similar resources
Bidirectional Label Propagation over Graphs
Graph-based label propagation algorithms are popular in state-of-the-art semi-supervised learning research. The key idea underlying this algorithmic family is to enforce labeling consistency between any two examples with a positive similarity. However, negative similarities or dissimilarities are equally valuable in practice. To this end, we simultaneously leverage similarities and dis...
Efficient Similarity Derived from Kernel-Based Transition Probability
Semi-supervised learning effectively integrates labeled and unlabeled samples for classification, and most of the methods are founded on the pair-wise similarities between the samples. In this paper, we propose methods to construct similarities from the probabilistic viewpoint, whilst the similarities have so far been formulated in a heuristic manner such as by k-NN. We first propose the kernel...
A Note on Semi-Supervised Learning using Markov Random Fields
This paper describes conditional-probability training of Markov random fields using combinations of labeled and unlabeled data. We capture the similarities between instances by learning the appropriate distance metric from the data. The likelihood model and several training procedures are presented.
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach that uses random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small number of labeled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
Graph-based Learning for Statistical Machine Translation
Current phrase-based statistical machine translation systems process each test sentence in isolation and do not enforce global consistency constraints, even though the test data is often internally consistent with respect to topic or style. We propose a new consistency model for machine translation in the form of a graph-based semi-supervised learning algorithm that exploits similarities betwee...